• Steven Ponce

Uncertainty in Cardiovascular Disease Prevalence Across the U.S.

95% confidence intervals for states and territories reveal varying levels of statistical precision (CDC, 2023)

30DayChartChallenge
Data Visualization
R Programming
2025
A visualization of cardiovascular disease prevalence across U.S. states and territories, highlighting statistical uncertainty through confidence intervals. This chart demonstrates how sample sizes affect the precision of health statistics, with wider intervals indicating less certainty in the estimate.
Published: April 26, 2025

Figure 1: A dot plot showing cardiovascular disease prevalence across U.S. states and territories with 95% confidence intervals. States are ordered vertically from highest to lowest prevalence (West Virginia at 6.7% to Virgin Islands at 1.4%). Each state/territory has a dot showing its estimate and a horizontal line extending from the dot showing the confidence interval range. A vertical dashed line marks the national median of 4.0%. The visualization demonstrates how statistical uncertainty varies, with some regions having wider confidence intervals than others.

Steps to Create this Graphic

1. Load Packages & Setup

## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
pacman::p_load(
  tidyverse,      # Easily Install and Load the 'Tidyverse'
  ggtext,         # Improved Text Rendering Support for 'ggplot2'
  showtext,       # Using Fonts More Easily in R Graphs
  janitor,        # Simple Tools for Examining and Cleaning Dirty Data
  skimr,          # Compact and Flexible Summaries of Data
  scales,         # Scale Functions for Visualization
  lubridate,      # Make Dealing with Dates a Little Easier
  ggrepel,        # Automatically Position Non-Overlapping Text Labels with 'ggplot2'
  camcorder       # Record Your Plot History
  )
})

### |- figure size ----
gg_record(
    dir    = here::here("temp_plots"),
    device = "png",
    width  = 8,
    height = 10,
    units  = "in",
    dpi    = 320
)

# Source utility functions
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
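
`pacman::p_load()` installs any missing package before loading it, which may not be wanted on every machine. As a rough alternative, the base R sketch below only reports which packages are missing. The helper name `missing_packages()` is made up for this illustration and is not part of the original setup.

```r
# Hypothetical helper (not part of the original setup): report which of the
# required packages are not installed, without installing anything.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used on this page (here is called via here::here())
required <- c(
  "tidyverse", "ggtext", "showtext", "janitor", "skimr",
  "scales", "lubridate", "ggrepel", "camcorder", "here"
)

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```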

2. Read in the Data

cdc_prevalence_cardiovascular_disease <- read_csv(
  here::here(
    "data/Behavioral_Risk_Factor_Surveillance_System__BRFSS__Prevalence_Data__2011_to_present__20250409.csv")) |> 
  clean_names()
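
Step 4 relies on a handful of column names produced by `clean_names()`. A minimal sketch of an early check is shown below, using a toy data frame with made-up values in place of the real CSV. The helper `check_columns()` is hypothetical and not part of the original code.

```r
# Columns that step 4 expects to exist after clean_names()
required_cols <- c(
  "year", "locationabbr", "locationdesc", "break_out",
  "data_value", "confidence_limit_low", "confidence_limit_high",
  "sample_size"
)

# Hypothetical helper: stop early with a readable message if columns are missing
check_columns <- function(df, required) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Toy stand-in for the real data (values are made up)
toy <- data.frame(
  year = 2023, locationabbr = "XX", locationdesc = "Example",
  break_out = "Overall", data_value = 4.2,
  confidence_limit_low = 3.6, confidence_limit_high = 4.8,
  sample_size = 500
)

check_columns(toy, required_cols)
```

With the real data, the call would be `check_columns(cdc_prevalence_cardiovascular_disease, required_cols)`.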

3. Examine the Data

glimpse(cdc_prevalence_cardiovascular_disease)
skim(cdc_prevalence_cardiovascular_disease)

4. Tidy Data

### |- Tidy ----
# Prepare state-level data
state_cvd <- cdc_prevalence_cardiovascular_disease |>
  filter(year == 2023) |>
  filter(break_out == "Overall" | is.na(break_out)) |>
  filter(
    str_detect(locationdesc, "^[A-Z]") & 
      !str_detect(locationdesc, "median|Median|average|Average")
  ) |>
  select(
    location = locationdesc,
    prevalence = data_value,
    lower_bound = confidence_limit_low,
    upper_bound = confidence_limit_high,
    sample_size,
    year
  ) |>
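  # Drop rows without an estimate before ordering, so the factor
  # levels are based only on observed prevalence values
  filter(!is.na(prevalence)) |>
  mutate(
    ci_width = upper_bound - lower_bound,
    location = str_replace(location, " State$", ""),
    location = fct_reorder(location, prevalence)
  )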

# Get national median estimate (US-level)
us_median <- cdc_prevalence_cardiovascular_disease |>
  filter(year == 2023, break_out == "Overall", locationabbr == "US") |> 
  pull(data_value)
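
The `ci_width` column is where sample size shows up in the chart. As a rough illustration, the sketch below uses the textbook Wald interval for a proportion, whose width shrinks with the square root of the sample size. This is a simplification: it assumes simple random sampling, while BRFSS estimates come from a weighted survey design, so the published intervals will not match these numbers. The sample sizes are made up.

```r
# Approximate (Wald) 95% CI width for a proportion, in percentage points.
# Simplification: assumes simple random sampling, which BRFSS does not use.
wald_ci_width <- function(p, n, z = 1.96) {
  2 * z * sqrt(p * (1 - p) / n) * 100
}

# Same 4% prevalence, hypothetical sample sizes
n <- c(500, 2000, 8000)
widths <- wald_ci_width(p = 0.04, n = n)

data.frame(n = n, ci_width_pct_points = round(widths, 2))
```

Under this approximation, quadrupling the sample size halves the interval width, which is why smaller samples tend to produce longer segments in the plot.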

5. Visualization Parameters

### |-  plot aesthetics ----
colors <- get_theme_colors(
  palette = c(
    "black", "gray40", "gray50", "gray70", "gray95", "white", "gray40"
    )
  )

### |-  titles and caption ----
# text
title_text    <- str_wrap("Uncertainty in Cardiovascular Disease Prevalence Across the U.S.",
                          width = 70) 

subtitle_text <- str_wrap("95% confidence intervals for states and territories reveal varying levels of statistical precision (CDC, 2023)",
                          width = 80)

caption_text <- create_dcc_caption(
  dcc_year = 2025,
  dcc_day = 26,
  source_text =  "CDC’s Behavioral Risk Factor Surveillance System (BRFSS)" 
)

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----

# Start with base theme
base_theme <- create_base_theme(colors)

# Add weekly-specific theme elements
weekly_theme <- extend_weekly_theme(
  base_theme,
  theme(
    
    # Text styling 
    plot.title = element_text(face = "bold", family = fonts$title, size = rel(1.14), margin = margin(b = 10)),
    plot.subtitle = element_text(family = fonts$subtitle, color = colors$text, size = rel(0.78), margin = margin(b = 20)),
    
    # Axis elements
    axis.text = element_text(color = colors$text, size = rel(0.7)),
    axis.title.y = element_blank(),
    axis.title.x = element_text(color = colors$text, size = rel(0.8), 
                                hjust = 0.5, margin = margin(t = 10)),
    
    axis.line.x = element_line(color = "gray50", linewidth = .2),
    
    # Grid elements
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_line(color = "gray65", linewidth = 0.05),
    panel.grid.major.x = element_line(color = "gray65", linewidth = 0.05),
    
    # Backgrounds and plot margins
    plot.background = element_rect(fill = colors$palette[6], color = colors$palette[6]),
    panel.background = element_rect(fill = colors$palette[6], color = colors$palette[6]),
    plot.margin = margin(t = 10, r = 20, b = 10, l = 20)
  )
)

# Set theme
theme_set(weekly_theme)
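
The helpers used in this step (`get_theme_colors()`, `create_base_theme()`, `extend_weekly_theme()`, and the font functions) come from the files sourced in step 1 and are not shown on this page. For readers who want to follow along without them, here is a minimal stand-in for the color helper. It only assumes what the code on this page uses: a list with a `palette` vector plus `title`, `subtitle`, `text`, and `caption` entries. The function name and the default colors are assumptions, and the real helper may behave differently.

```r
# Hypothetical stand-in for the author's get_theme_colors().
# Default text colors below are placeholders, not the author's values.
get_theme_colors_sketch <- function(palette,
                                    title = "gray20",
                                    subtitle = "gray30",
                                    text = "gray30",
                                    caption = "gray40") {
  list(
    palette  = palette,
    title    = title,
    subtitle = subtitle,
    text     = text,
    caption  = caption
  )
}

colors_demo <- get_theme_colors_sketch(
  palette = c("black", "gray40", "gray50", "gray70", "gray95", "white", "gray40")
)

# Elements are then accessed by position, as in the plot code
colors_demo$palette[6]
```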

6. Plot

### |-  Plot ----
p <- ggplot(state_cvd, aes(x = prevalence, y = location)) +
  # Geoms
  geom_segment(
    aes(x = lower_bound, xend = upper_bound, yend = location),
    color = colors$palette[4], linewidth = 0.9, alpha = 0.9
  ) +
  geom_point(
    size = 3, color = colors$palette[1],
    fill = colors$palette[1]
  ) +
  geom_text(
    aes(x = upper_bound, label = sprintf("%.1f%% [%.1f–%.1f]", prevalence, lower_bound, upper_bound)),
    nudge_x = 0.25, hjust = 0, vjust = 0.3, size = 2.7, color = colors$palette[2]
  ) +
  geom_vline(
    xintercept = us_median, color = colors$palette[2], 
    linetype = "dashed", linewidth = 0.5
  ) + 
  # Annotate
  annotate(
    "text", x = us_median, y = 44, 
    label = sprintf("National median: %.1f%%", us_median),
    hjust = 0, vjust = -1, size = 3, color = colors$palette[1], 
    fontface = "italic", angle = 90
  ) +
  # Scales
  scale_x_continuous(
    limits = c(0, max(state_cvd$upper_bound) + 0.7),
    labels = label_percent(scale = 1, suffix = "%"),
    breaks = seq(0, 8, by = 2),
    expand = expansion(mult = c(0.01, 0.1))
  ) +
  scale_y_discrete(
    expand = expansion(mult = c(0.01, 0.02))
  ) +
  # Labs
  labs(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text,
    x = "Cardiovascular Disease Prevalence (%)",
    y = NULL
  ) +
  # Theme
  theme(
    plot.title = element_text(
      size = rel(1.45),
      family = fonts$title,
      face = "bold",
      color = colors$title,
      margin = margin(t = 5, b = 5)
    ),
    plot.subtitle = element_text(
      size = rel(0.95),
      family = fonts$subtitle,
      color = colors$subtitle,
      lineheight = 1.1,
      margin = margin(t = 5, b = 14)
    ),
    plot.caption = element_markdown(
      size = rel(0.65),
      family = fonts$caption,
      color = colors$caption,
      lineheight = 0.65,
      hjust = 0.5,
      halign = 0.5,
      margin = margin(t = 10, b = 5)
    )
  ) 
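
The core of the chart is three layers: a segment for the interval, a point for the estimate, and a dashed reference line. The stripped-down sketch below reproduces that structure with toy data and no custom theme or fonts. All values are made up, and the reference line here is simply the median of the toy values rather than a published figure.

```r
library(ggplot2)

# Toy data (values are made up, not taken from BRFSS)
demo <- data.frame(
  location    = c("A", "B", "C"),
  prevalence  = c(5.5, 4.0, 2.5),
  lower_bound = c(4.9, 3.7, 1.5),
  upper_bound = c(6.1, 4.3, 3.5)
)

# Order locations by prevalence; with one row per location this mirrors
# what fct_reorder(location, prevalence) does in the full code
demo$location <- factor(
  demo$location,
  levels = demo$location[order(demo$prevalence)]
)

# Reference value for the dashed line (median of the toy estimates)
demo_median <- median(demo$prevalence)

p_demo <- ggplot(demo, aes(x = prevalence, y = location)) +
  # Interval first, so the point is drawn on top of it
  geom_segment(
    aes(x = lower_bound, xend = upper_bound, yend = location),
    color = "gray70", linewidth = 0.9
  ) +
  geom_point(size = 3, color = "black") +
  geom_vline(xintercept = demo_median, linetype = "dashed", color = "gray40")
```

Printing `p_demo` draws the chart. Drawing the segment before the point keeps the estimate visible on top of its interval.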

7. Save

### |-  plot image ----  
save_plot(
  p, 
  type = "30daychartchallenge", 
  year = 2025, 
  day = 26, 
  width = 8, 
  height = 10
  )
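
`save_plot()` is one of the custom helpers sourced in step 1 and is not shown on this page. Without it, a plain `ggplot2::ggsave()` call with the same dimensions is a reasonable substitute. The sketch below uses a placeholder plot and writes to a temporary directory. The output path and file name are assumptions, since the real helper chooses its own location.

```r
library(ggplot2)

# Small placeholder plot so the example runs on its own
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Assumed output location; the real save_plot() decides its own path
out_file <- file.path(tempdir(), "30dcc_2025_26.png")

# Same size as the gg_record() settings in step 1
ggsave(
  filename = out_file,
  plot     = p_demo,
  width    = 8,
  height   = 10,
  units    = "in",
  dpi      = 320
)
```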

8. Session Info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] here_1.0.1      camcorder_0.1.0 hoopR_2.1.0     ggrepel_0.9.6  
 [5] scales_1.3.0    skimr_2.1.5     janitor_2.2.0   showtext_0.9-7 
 [9] showtextdb_3.0  sysfonts_0.8.9  ggtext_0.1.2    lubridate_1.9.3
[13] forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2    
[17] readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1  
[21] tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1    farver_2.1.2        fastmap_1.2.0      
 [4] pacman_0.5.1        promises_1.3.0      digest_0.6.37      
 [7] timechange_0.3.0    lifecycle_1.0.4     rsvg_2.6.1         
[10] processx_3.8.4      magrittr_2.0.3      compiler_4.4.0     
[13] rlang_1.1.6         tools_4.4.0         utf8_1.2.4         
[16] yaml_2.3.10         data.table_1.16.2   knitr_1.49         
[19] labeling_0.4.3      htmlwidgets_1.6.4   curl_6.0.0         
[22] xml2_1.3.6          repr_1.1.7          websocket_1.4.2    
[25] withr_3.0.2         grid_4.4.0          fansi_1.0.6        
[28] colorspace_2.1-1    future_1.34.0       globals_0.16.3     
[31] cli_3.6.4           rmarkdown_2.29      ragg_1.3.3         
[34] generics_0.1.3      RcppParallel_5.1.10 rstudioapi_0.17.1  
[37] httr_1.4.7          tzdb_0.5.0          commonmark_1.9.2   
[40] chromote_0.4.0      rvest_1.0.4         parallel_4.4.0     
[43] base64enc_0.1-3     vctrs_0.6.5         jsonlite_1.8.9     
[46] hms_1.1.3           listenv_0.9.1       systemfonts_1.1.0  
[49] magick_2.8.5        glue_1.8.0          parallelly_1.43.0  
[52] gifski_1.32.0-1     codetools_0.2-20    ps_1.8.1           
[55] stringi_1.8.4       gtable_0.3.6        later_1.3.2        
[58] munsell_0.5.1       furrr_0.3.1         pillar_1.9.0       
[61] htmltools_0.5.8.1   R6_2.5.1            textshaping_0.4.0  
[64] rprojroot_2.0.4     evaluate_1.0.1      markdown_1.13      
[67] gridtext_0.1.5      snakecase_0.11.1    renv_1.0.3         
[70] Rcpp_1.0.13-1       svglite_2.1.3       xfun_0.49          
[73] pkgconfig_2.0.3    

9. GitHub Repository

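The complete code for this analysis is available in 30dcc_2025_26.qmd (https://github.com/poncest/personal-website/blob/master/data_visualizations/TidyTuesday/2025/30dcc_2025_26.qmd).

The full repository is at https://github.com/poncest/personal-website/.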

10. References

  1. Data Sources:
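    • CDC’s Behavioral Risk Factor Surveillance System (BRFSS), BRFSS: Prevalence of Cardiovascular Disease: https://data.cdc.gov/Behavioral-Risk-Factors/BRFSS-Graph-of-Current-Prevalence-of-Cardiovascula/gfhd-2f5y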

© 2024 Steven Ponce
